AITopics | conditional gradient

Collaborating Authors

conditional gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Heavy Ball Momentum for Conditional Gradient

Neural Information Processing SystemsDec-24-2025, 18:33:14 GMT

Conditional gradient, aka Frank Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications. Unlike projection-based methods, momentum cannot improve the convergence rate of FW, in general. This limitation motivates the present work, which deals with heavy ball momentum, and its impact to FW. Specifically, it is established that heavy ball offers a unifying perspective on the primal-dual (PD) convergence, and enjoys a tighter \textit{per iteration} PD error rate, for multiple choices of step sizes, where PD error can serve as the stopping criterion in practice. In addition, it is asserted that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound. Numerical results demonstrate the usefulness of heavy ball momentum in FW iterations.

conditional gradient, heavy ball momentum, name change, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Add feedback

Adaptive Conditional Gradient Descent

Khademi, Abbas, Silveti-Falls, Antonio

arXiv.org Machine LearningOct-14-2025

Selecting an effective step-size is a fundamental challenge in first-order optimization, especially for problems with non-Euclidean geometries. This paper presents a novel adaptive step-size strategy for optimization algorithms that rely on linear minimization oracles, as used in the Conditional Gradient or non-Euclidean Normalized Steepest Descent algorithms. Using a simple heuristic to estimate a local Lipschitz constant for the gradient, we can determine step-sizes that guarantee sufficient decrease at each iteration. More precisely, we establish convergence guarantees for our proposed Adaptive Conditional Gradient Descent algorithm, which covers as special cases both the classical Conditional Gradient algorithm and non-Euclidean Normalized Steepest Descent algorithms with adaptive step-sizes. Our analysis covers optimization of continuously differentiable functions in non-convex, quasar-convex, and strongly convex settings, achieving convergence rates that match state-of-the-art theoretical bounds. Comprehensive numerical experiments validate our theoretical findings and illustrate the practical effectiveness of Adaptive Conditional Gradient Descent. The results exhibit competitive performance, underscoring the potential of the adaptive step-size for applications.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

2510.1144

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Europe > Russia (0.04)
Europe > France (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)

Add feedback

Scalable DC Optimization via Adaptive Frank-Wolfe Algorithms

Pokutta, Sebastian

arXiv.org Artificial IntelligenceAug-5-2025

We consider the problem of minimizing a difference of (smooth) convex functions over a compact convex feasible region $P$, i.e., $\min_{x \in P} f(x) - g(x)$, with smooth $f$ and Lipschitz continuous $g$. This computational study builds upon and complements the framework of Maskan et al. [2025] by integrating advanced Frank-Wolfe variants to reduce computational overhead. We empirically show that constrained DC problems can be efficiently solved using a combination of the Blended Pairwise Conditional Gradients (BPCG) algorithm [Tsuji et al., 2022] with warm-starting and the adaptive error bound from Maskan et al. [2025]. The result is a highly efficient and scalable projection-free algorithm for constrained DC optimization.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.17545

Country: Europe > Germany (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Blended Conditional Gradients: the unconditioning of conditional gradients

Braun, Gábor, Pokutta, Sebastian, Tu, Dan, Wright, Stephen

arXiv.org Artificial IntelligenceMar-21-2025

We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank--Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex.

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1805.07311

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Heavy Ball Momentum for Conditional Gradient

Neural Information Processing SystemsJan-18-2025, 18:22:37 GMT

conditional gradient, heavy ball momentum, iteration, (1 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.88)

Add feedback

Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

Nguyen, Hoang Huy, Li, Yan, Zhao, Tuo

arXiv.org Artificial IntelligenceApr-3-2024

In modern decentralized applications, ensuring communication efficiency and privacy for the users are the key challenges. In order to train machine-learning models, the algorithm has to communicate to the data center and sample data for its gradient computation, thus exposing the data and increasing the communication cost. This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations. To this end, we propose the primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution with the optimal gradient complexity of $O(1/\sqrt{\varepsilon}+\sigma^2/{\varepsilon^2})$ and $O(\log(1/\varepsilon)+\sigma^2/\varepsilon)$ for the convex and strongly convex setting respectively and an LO (Linear Optimization) complexity of $O(1/\varepsilon^2)$ for both settings given a stochastic gradient oracle with variance $\sigma^2$. Compared with the prior work \cite{wai-fw-2017}, our framework relaxes the assumption of the optimal solution being a strict interior point of the feasible set and enjoys wider applicability for large-scale training using a stochastic gradient oracle. We also demonstrate the efficiency of our algorithms with various numerical experiments.

algorithm, complexity, gradient, (15 more...)

arXiv.org Artificial Intelligence

2404.02511

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Projection-free Distributed Online Learning with Strongly Convex Losses

Wan, Yuanyu, Wang, Guanghui, Zhang, Lijun

arXiv.org Machine LearningMar-20-2021

To efficiently solve distributed online learning problems with complicated constraints, previous studies have proposed several distributed projection-free algorithms. The state-of-the-art one achieves the $O({T}^{3/4})$ regret bound with $O(\sqrt{T})$ communication complexity. In this paper, we further exploit the strong convexity of loss functions to improve the regret bound and communication complexity. Specifically, we first propose a distributed projection-free algorithm for strongly convex loss functions, which enjoys a better regret bound of $O(T^{2/3}\log T)$ with smaller communication complexity of $O(T^{1/3})$. Furthermore, we demonstrate that the regret of distributed online algorithms with $C$ communication rounds has a lower bound of $\Omega(T/C)$, even when the loss functions are strongly convex. This lower bound implies that the $O(T^{1/3})$ communication complexity of our algorithm is nearly optimal for obtaining the $O(T^{2/3}\log T)$ regret bound up to polylogarithmic factors. Finally, we extend our algorithm into the bandit setting and obtain similar theoretical guarantees.

algorithm, communication complexity, inequality, (13 more...)

arXiv.org Machine Learning

2103.11102

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

Add feedback

Robust Structured Statistical Estimation via Conditional Gradient Type Methods

Zhuo, Jiacheng, Liu, Liu, Caramanis, Constantine

arXiv.org Machine LearningJul-7-2020

Structured statistical estimation problems are often solved by Conditional Gradient (CG) type methods to avoid the computationally expensive projection operation. However, the existing CG type methods are not robust to data corruption. To address this, we propose to robustify CG type methods against Huber's corruption model and heavy-tailed data. First, we show that the two Pairwise CG methods are stable, i.e., do not accumulate error. Combined with robust mean gradient estimation techniques, we can therefore guarantee robustness to a wide class of problems, but now in a projection-free algorithmic framework. Next, we consider high dimensional problems. Robust mean estimation based approaches may have an unacceptably high sample complexity. When the constraint set is a $\ell_0$ norm ball, Iterative-Hard-Thresholding-based methods have been developed recently. Yet extension is non-trivial even for general sets with $O(d)$ extreme points. For setting where the feasible set has $O(\text{poly}(d))$ extreme points, we develop a novel robustness method, based on a new condition we call the Robust Atom Selection Condition (RASC). When RASC is satisfied, our method converges linearly with a corresponding statistical error, with sample complexity that scales correctly in the sparsity of the problem, rather than the ambient dimension as would be required by any approach based on robust mean estimation.

artificial intelligence, machine learning, rasc, (14 more...)

arXiv.org Machine Learning

2007.03572

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Inexact and Stochastic Generalized Conditional Gradient with Augmented Lagrangian and Proximal Step

Silveti-Falls, Antonio, Molinari, Cesare, Fadili, Jalal

arXiv.org Machine LearningMay-11-2020

In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper, which we denote ICGALP, that allows for errors in the computation of several important quantities. In particular this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, e.g. in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lower-semicontinuous functions subject to an affine constraint of the form $Ax=b$ for some bounded linear operator $A$. Only one of the functions in the objective is assumed to be differentiable, the other two are assumed to have an accessible prox operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian to an optimum and asymptotic feasibility of the affine constraint as well as weak convergence of the dual variable to a solution of the dual problem, all in an almost sure sense. Almost sure convergence rates, both pointwise and ergodic, are given for the Lagrangian values and the feasibility gap. Numerical experiments verifying the predicted rates of convergence are shown as well.

artificial intelligence, lemma 4, machine learning, (18 more...)

arXiv.org Machine Learning

2005.05158

Country:

Europe > France (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.63)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Locally Accelerated Conditional Gradients

Carderera, Alejandro, Diakonikolas, Jelena, Pokutta, Sebastian

arXiv.org Machine LearningJun-18-2019

Conditional gradient methods form a class of projection-free first-order algorithms for solving smooth convex optimization problems. Apart from eschewing projections, these methods are attractive because of their simplicity, numerical performance, and the sparsity of the solutions outputted. However, they do not achieve optimal convergence rates. We present the Locally Accelerated Conditional Gradients algorithm that relaxes the projection-freeness requirement to only require projection onto (typically low-dimensional) simplices and mixes accelerated steps with conditional gradient steps to achieve local acceleration. We derive asymptotically optimal convergence rates for this algorithm. Our experimental results demonstrate the practicality of our approach; in particular, the speedup is achieved both in wall-clock time and per-iteration progress compared to standard conditional gradient methods and a Catalyst-accelerated Away-Step Frank-Wolfe algorithm.

algorithm, artificial intelligence, optimization problem, (14 more...)

arXiv.org Machine Learning

1906.07867

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback